# clean up workspace environment
rm(list = ls())
# all packages used for the assignment
library(mosaicData)
library(DataComputing)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(lubridate)
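# choose "USvideos.csv" and tag each row with its country
file_location <- file.choose()
us_youtube_trend <- read.csv(file_location) %>%
  mutate(country = "US")
# choose "CAvideos.csv" and tag each row with its country
file_location <- file.choose()
ca_youtube_trend <- read.csv(file_location) %>%
  mutate(country = "CA")
# stack the two countries into one data frame
youtube_trend <- rbind(us_youtube_trend, ca_youtube_trend)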
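Because `file.choose()` opens a dialog on every run, the same load-and-tag step could also be written as a small helper that takes a file path. The sketch below is only an illustration: `read_trend` is a hypothetical helper name, and the two temporary CSV files hold made-up toy rows that stand in for “USvideos.csv” and “CAvideos.csv” so the example runs on its own.

```r
library(dplyr)

# Hypothetical helper: read one country's file and tag each row with its country.
read_trend <- function(path, country_code) {
  read.csv(path, stringsAsFactors = FALSE) %>%
    mutate(country = country_code)
}

# Stand-in files with made-up rows (not the Kaggle data).
us_path <- tempfile(fileext = ".csv")
ca_path <- tempfile(fileext = ".csv")
write.csv(data.frame(views = c(10, 20), likes = c(1, 2)), us_path, row.names = FALSE)
write.csv(data.frame(views = 30, likes = 3), ca_path, row.names = FALSE)

# Stack the two tagged data frames, as rbind() does above.
toy_trend <- bind_rows(
  read_trend(us_path, "US"),
  read_trend(ca_path, "CA")
)
```

With the real files, the temporary paths would be replaced by the locations of the two downloaded CSVs.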
The data come from “USvideos.csv” and “CAvideos.csv”, which were posted on Kaggle by Mitchell J: https://www.kaggle.com/datasnaek/youtube-new. The data were originally collected for sharing on Kaggle about 2 years ago. Each case represents one trending YouTube video. For example, one row describes a video posted by Eminem: it was trending on Nov 14, 2017 and had 17158579 views, 787425 likes, 43420 dislikes, and 125882 comments. Our dataset has 40926 rows of trending YouTube videos. I plan to use the variables “trending_date”, “views”, “likes”, “dislikes”, “comment_count”, and “comment_disabled”. The regions covered are the US and Canada.
The first six rows are shown below:
head(youtube_trend)
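A possible next step is to keep only the planned variables and convert `trending_date` into a proper date. The sketch below assumes the column is stored as text in year.day.month order (for example "17.14.11" for Nov 14, 2017), which should be checked against the `head()` output before use. The one-row `toy` data frame reuses the Eminem figures quoted earlier and is typed in by hand rather than read from the files.

```r
library(dplyr)

# Hand-typed stand-in row; the trending_date format is an assumption.
toy <- data.frame(
  trending_date = "17.14.11",
  views = 17158579,
  likes = 787425,
  dislikes = 43420,
  comment_count = 125882,
  stringsAsFactors = FALSE
)

# Keep the variables of interest and parse the date as yy.dd.mm.
toy_clean <- toy %>%
  select(trending_date, views, likes, dislikes, comment_count) %>%
  mutate(trending_date = as.Date(trending_date, format = "%y.%d.%m"))
```

Base R's `as.Date()` is used here so the format string is explicit; if the real column turns out to use a different layout, only the `format` argument needs to change.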